Several researchers have explored deep learning-based post-filters to increase the quality of statistical parametric speech synthesis. Such post-filters map synthetic speech to natural speech, treating the different parameters separately and attempting to reduce the gap between them. Long Short-Term Memory (LSTM) neural networks have been applied successfully for this purpose, but there is still room for improvement both in the results and in the process itself. In this paper, we introduce a new pre-training approach for the LSTM, aimed at enhancing the quality of the synthesized speech, particularly its spectrum, in a more efficient manner. Our approach begins with the auto-associative training of an LSTM network, which is then used to initialize the post-filters. We show the advantages of this initialization for enhancing the Mel-Frequency Cepstral parameters of synthetic speech. Results show that, in most cases, this initialization enhances the statistical parametric speech spectrum better than the common random initialization of the networks.
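To make the two-stage procedure concrete, the following is a minimal sketch in PyTorch of auto-associative pre-training followed by post-filter fine-tuning. The network dimensions, optimizer, training loop, and the placeholder MFCC tensors are illustrative assumptions, not the configuration used in the paper.

```python
import torch
import torch.nn as nn

class LSTMPostFilter(nn.Module):
    """LSTM that maps a sequence of MFCC frames to corrected frames.
    Sizes are assumed for illustration."""

    def __init__(self, n_mfcc=40, hidden=256):
        super().__init__()
        self.lstm = nn.LSTM(n_mfcc, hidden, batch_first=True)
        self.proj = nn.Linear(hidden, n_mfcc)

    def forward(self, x):
        h, _ = self.lstm(x)
        return self.proj(h)

def train(model, inputs, targets, epochs=10, lr=1e-3):
    """Simple MSE regression loop (hypothetical training setup)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        opt.step()

# Placeholder data: batches of natural and synthetic MFCC sequences.
natural = torch.randn(8, 100, 40)                       # natural-speech MFCCs
synthetic = natural + 0.1 * torch.randn_like(natural)   # degraded stand-in

# Stage 1: auto-associative pre-training -- the network learns to
# reproduce natural MFCC sequences from themselves.
pretrained = LSTMPostFilter()
train(pretrained, natural, natural)

# Stage 2: the pre-trained weights initialize the post-filter, which is
# then fine-tuned to map synthetic MFCCs toward the natural ones.
postfilter = LSTMPostFilter()
postfilter.load_state_dict(pretrained.state_dict())
train(postfilter, synthetic, natural)
```

The intuition behind the design is that the auto-associative stage places the network near an identity mapping, so fine-tuning only has to learn the residual gap between synthetic and natural parameters rather than the full mapping from scratch.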